Practical Batch-Updatable External Hashing with Sorting
Authors
Abstract
This paper presents a practical external hashing scheme that supports fast lookup (7 microseconds) for large datasets (millions to billions of items) with a small memory footprint (2.5 bits/item) and fast index construction (151 K items/s for 1-KiB key-value pairs). Our scheme combines three key techniques: (1) a new index data structure (Entropy-Coded Tries); (2) the use of sorting as the main data manipulation method; and (3) support for incremental index construction for dynamic datasets. We evaluate our scheme by building an external dictionary on flash-based drives and demonstrate our scheme’s high performance, compactness, and practicality.
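The abstract names sorting as the main data manipulation method and batch (incremental) updates as a goal. The following is a minimal sketch of that general idea only, not the paper's implementation: it keeps records sorted by hashed key and applies a batch by sorting it and merging it sequentially with the existing run. It does not implement Entropy-Coded Tries; a plain binary search stands in for the index. All names (`hashed`, `sort_batch`, `merge_runs`, `lookup`) are hypothetical, and the use of SHA-1 and in-memory lists instead of on-flash files is an assumption made for illustration.

```python
import bisect
import hashlib
import heapq
from typing import Iterable, Iterator, List, Optional, Tuple

# (hashed key, original key, value); hypothetical record layout for this sketch.
Record = Tuple[bytes, bytes, bytes]


def hashed(key: bytes) -> bytes:
    """Hash order, not key order, decides where a record is stored (assumed: SHA-1)."""
    return hashlib.sha1(key).digest()


def sort_batch(items: Iterable[Tuple[bytes, bytes]]) -> List[Record]:
    """Sort a batch of new key-value pairs by hashed key."""
    return sorted((hashed(k), k, v) for k, v in items)


def merge_runs(old: Iterable[Record], new: Iterable[Record]) -> Iterator[Record]:
    """Merge two hash-sorted runs in one sequential pass; the newer record wins."""
    # Tag records with an age so that, for equal hashes, the new one is seen first.
    tagged_new = ((h, 0, k, v) for h, k, v in new)
    tagged_old = ((h, 1, k, v) for h, k, v in old)
    last_hash = None
    for h, _, k, v in heapq.merge(tagged_new, tagged_old):
        if h == last_hash:
            continue  # an older (or duplicate) version of a key already emitted
        last_hash = h
        yield (h, k, v)


def lookup(run: List[Record], key: bytes) -> Optional[bytes]:
    """Binary search over the sorted run; a stand-in for a compact index."""
    h = hashed(key)
    i = bisect.bisect_left(run, (h,))
    if i < len(run) and run[i][0] == h and run[i][1] == key:
        return run[i][2]
    return None
```

A possible usage: build an initial run with `sort_batch`, then apply a later batch with `list(merge_runs(store, sort_batch(batch)))`. Because both inputs are already sorted, the merge reads and writes data sequentially, which is the access pattern that suits flash and disk storage.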
Related resources
Accelerating External Search with Bitstate Hashing
In this paper we refine external exploration for explicit state model checking by a fusion with internal bitstate hashing. External A* provides a method to cope with large state spaces by efficiently utilizing secondary storage devices like hard disks to maintain the open and closed lists. Duplicates are removed by a two-level refinement scheme that involves sorting a subset of the open list e...
External Facelist Calculation with Data-Parallel Primitives
External facelist calculation on three-dimensional unstructured meshes is used in scientific visualization libraries to efficiently render the results of operations such as clipping, interval volumes, and material boundaries. With this study, we consider the external facelist algorithm on many-core architectures. We introduce four different approaches: three based on hashing and one based on so...
Online Supervised Hashing for Ever-Growing Datasets
Supervised hashing methods are widely used for nearest neighbor search in computer vision applications. Most state-of-the-art supervised hashing approaches employ batch-learners. Unfortunately, batch-learning strategies can be inefficient when confronted with large training datasets. Moreover, with batch-learners, it is unclear how to adapt the hash functions as a dataset continues to grow and ...
Privacy-Preserving Access of Outsourced Data via Oblivious RAM Simulation
Suppose a client, Alice, has outsourced her data to an external storage provider, Bob, because he has capacity for her massive data set, of size n, whereas her private storage is much smaller, say, of size O(n^(1/r)), for some constant r > 1. Alice trusts Bob to maintain her data, but she would like to keep its contents private. She can encrypt her data, of course, but she also wishes to keep her acce...
Oblivious RAM Revisited
We reinvestigate the oblivious RAM concept introduced by Goldreich and Ostrovsky, which enables a client, that can store locally only a constant amount of data, to store remotely n data items, and access them while hiding the identities of the items which are being accessed. Oblivious RAM is often cited as a powerful tool, which can be used, for example, for search on encrypted data or for prev...
Publication date: 2013